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Abstract 

We propose a new approach to simulate hypothetical physics processes which are 



^ ■ defined by multiple free parameters. Compared to the conventional grid-scan 

I ■ approach, the new method can produce accurate estimations of the detector 

Q^ , acceptance and signal event yields continuously over the parameter space with 



fewer simulation events. The performance of this method is illustrated with two 



CS| ■ realistic cases. 

>■ 
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1. Introduction 

As data collected by experiments at the Large Hadron Collider (LHC) is 
rapidly increasing, many efforts have been devoted to searches for various Be- 
yond the Standard Model (BSM) physics at the energy frontier. Many BSM 
models are defined by multiple free parameters, such as the masses of hypothet- 
ical particles and their coupling constants, which are to be constrained by data. 
The various cross sections and branching ratios, as well as detector acceptance 
and signal event yields, depend on the parameters. Consequently, the search 
results for BSM physics are usually expressed in terms of excluded regions in 
the multi-dimensional parameter space. 
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In most BSM searches, the hypothetical signal processes are studied in the 
"grid-scan" approach, i.e. Monte-Carlo (MC) samples are simulated with spe- 
cific values of the parameters, corresponding to discrete points on a grid in the 
parameter space. Each MC sample generally contains a considerable number of 
events for adequate statistical precision. The signal yields are estimated from 
MC results of each sample and interpolated between these grid points. However, 
to produce many samples with full-detector simulation is computationally ex- 
pensive, especially if sufficient granularity in the multivariate parameter space is 
desired. It suffers from the so-called "curse of dimensionality" , i.e. the number 
of grid points needed increases exponentially with the number of free param- 
eters. Usually, the grid-scanned parameter space is limited to two dimensions 
at a time, keeping the other parameters fixed. Moreover, the grid is often con- 
structed sparsely. As a result, the search limits derived are subject to systematic 
uncertainties from the interpolation which have to be estimated. 

In this paper, we propose a new method to achieve continuous prediction 
of signal yields over the entire parameter space, under the assumption that the 
acceptance is a smooth function of the parameters. With such an assumption, 
the simulated events on neighboring space points are no longer treated indepen- 
dently, therefore fewer events are needed in total. 

This paper is organized as follows. In section [5] we describe the technical 
procedure of this method. Then we illustrate the performance of the approach 
in section |31 with a 2-D example which simulates the production of the hypo- 
thetical right-handed W boson and heavy Majorana neutrino in the Left-Right 
Symmetric Model (LRSM). Another 4-D example of a simplified SuperSymmet- 
ric (SUSY) model is demonstrated in section SI to highlight the advantage in 
higher-dimensional parameter space. In section [SJ we discuss potential improve- 
ments that can be implemented in various special scenarios. 



2. Continuous simulation 

In order to estimate the signal yield, two parts of information are needed 
from MC simulation. One is the production rate of the considered process 
(denoted as a), which includes the cross-section and the branching ratio. This 
can normally be obtained at the event generation level and is relatively light- 
weight from a computational point of view. The more relevant part is the 
detector acceptance e, namely the probability for the events to be successfully 
reconstructed and selected. This is related to the detector coverage and the 
reconstruction efficiency, for which many MC events with parton showers and 
full detector-simulation are needed for a reliable estimation. The signal yield 
(N) is the product of these two components and the integrated luminosity (£), 

N = aeC (1) 

Considering the fact that detector simulation is the dominant component of 
the computing requirement, it is crucial to reduce the statistics demand of the 
acceptance study. In the grid-scan method, this is achieved by limiting the grid 
size in the parameter space and correspondingly the number of MC samples. 
However, a considerable number of events are still required within each sample, 
in order to evaluate the acceptance e with sufficient statistical accuracy. 

Contrary to the grid-scan method, the continuous simulation technique sim- 
ulates events over more parameter space points, either stochastically distributed 
or on a grid of very fine granularity. The overall number of MC events needed 
is reduced by generating only a few events at each selected parameter space 
point. If we treated each point independently, little statistical precision on the 
determination of e could be achieved with those few events alone. However, 
under the assumption that e changes smoothly with respect to the parameters 
X, the MC events on neighboring points help to constrain e. This is realized by 
fitting the function e(x) using a Bayesian Neural Network (BNN) [l]. 

Neural Network (NN) is a very powerful technique for multivariate pattern 
recognition as it can handle multi-dimensional non-linear data without suffering 



much of the "curse of dimensionahty" . In the past decade, it has been exten- 
sively used in the field of Particle Physics. In most applications it was used 
to construct a multivariate discriminant, with the aim of separating signals 
from backgrounds [2|-|4|. On the other hand, NN has long been recognized as 
a universal approximator for multivariate functions, as long as it has sufficient 
neurons p, |6| . This functionality as a non-parametric regression tool has also 
been exploited by physicists in recent years t7"9]. 

Our usage of BNN to fit the acceptance distribution is also a regression 
application. It differs from previous regression applications in that the output 
of our function e(x) is a probability taking values between and 1, and that our 
input samples are MC events with Boolean target values to indicate whether 
the event is selected or not. The grid-scan approach needs to accumulate many 
MC events at each single point before it can evaluate e to a good precision. 
Instead, the BNN algorithm we used here [10| estimates the whole probability 
distribution in an unbinned manner. It can achieve similar precision to the 
grid-scan method with much fewer MC events. 

This unbinned regression is carried out as follows. For each generated MC 
event, the corresponding parameters of the model x are passed to the BNN as 
the input variables. In addition, a target value i of 1 or is assigned to each 
event, depending on whether it survived the selection cuts. Then the BNN 
is fitted (or "trained") to have its output 2/(x) approximating the probability 
distribution e(x) by minimizing the cross-entropy cost function 

CE = J2i-tk logy(xk) - (1 - tk) log(l - 2/(xk))) (2) 

fc 

where k loops over all simulated events. The cross-entropy function is derived 
from Bernoulli likelihood and naturally attributes the meaning of success prob- 
ability to the output, y. 

The trained BNN can then be used to estimate e(x). Given any point x in 
the parameter space, the BNN will output the estimated value of the acceptance. 
Multiplying it to the production rate cr(x) estimated from MC generators and 
the integrated luminosity, we obtain the corresponding signal yield at any x. 



The Bayesian implementation of the NN provides not only the fitted value 
but also the estimation of the uncertainty for each predicted e(x) [lO|. This 
uncertainty estimation reflects the statistical fluctuation of the input events as 
well as the goodness of BNN fitting. It is based on the same training sample so 
that no additional simulation and computing resources are needed. 

3. Demo: 2-D Left-Right Symmetric Model 

In the following we will demonstrate the application of the continuous MC 
technique by simulating the hypothetical production of a right-handed W boson 
(Wr) which subsequently decays into a heavy Majorana neutrino {Nn) as pre- 
dicted in the Left-Right Symmetrical Models (LRSM). This is a typical BSM 
model searched for at the LHC JTi] . Two leptons are expected in the final state, 
as shown in figure [TJ 




Figure 1: The Feynman diagram of Wu production in the LRSM model. The right-handed 
W boson couples with a lepton and a hypothetical Majorana neutrino, which subsequently 
decays into another lepton and two jets. 



The masses of the two hypothetical particles, M{Wr) and M{Nf;), are gen- 
erally considered as the free parameters of this model. These values affect not 
only the cross-section of the process, but also the kinematics used in the event 
selection. In this example, we considered the parameter space region with Wr 
mass between 0.5 TeV and 1.5 TeV, which is being actively studied at the LHC 
experiments. The Nr mass was limited to be smaller than the Wn mass in 
order to accommodate the concerned decay mode. 



The Pythia generator [1^ is used to simulate this process. For simphcity, 
detector effects are not inchidcd in this demonstration. Only the acceptance 
e(x) at the generator level is studied. From the above-mentioned mass ranges 
we picked 100,000 points stochastically, which are dense enough to cover the 2D 
parameter space. Then 10 MC events were generated at each point so that we 
have a total of 1 million events simulated for this demonstration of continuous 
MC. The reason for generating multiple events at each point is to reduce the 
overhead of generator re-initialization and the computing load of BNN training. 
However, we still maintain a relatively low number of events at each point in 
order to spread over as many space points as possible. 

The event selection we assumed requires at least two leptons (e,/i) with 
transverse momentum greater than 20 GeV and pseudo-rapidity t] between - 
2.5 and 2.5. A 100% fiducial efficiency is assumed for lepton identification. In 
addition, a common isolation cut for background suppression is applied, requir- 
ing the transverse energy fiow of all interactive particles inside a cone of size 



AR ^ ^(A77)2 + (A(/))2 = 0.2 around the lepton to be less than 10% of the 
lepton's transverse momentum. 

The acceptance of this event selection is fitted by a BNN with forty neurons 
in the first hidden layer and ten neurons in the second hidden layer. This 
topology is arbitrarily chosen, merely to provide redundant complexity within 
the computing budget. The BNN can automatically constrain the complexity 
and avoid over-fitting by a regulator mechanism, which is tuned based on the 
Bayesian evidence framework jll. Il0|. The fitted distribution of e(x) is given in 



figure 2(a) 



To test the performance, we also simulated another set of discrete samples by 
the grid-scan approach. The values of M{Wr) are from 600 GeV to 1400 GeV 
with 100 GeV steps, and the values oiM{NR) are from 50 GeV to (M(VKr)-50) 
GeV with 50 GeV steps. A total of 171 samples have been simulated with these 
points, each containing 100,000 events for statistical accuracy. The acceptance 



function evaluated from these samples is shown in figure 2(b) 



We then compared the e(x) values fitted by the BNN at these grid points to 
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Figure 2: The acceptance distribution fitted by BNN (a) and estimated by the grid-scan 
samples (b). 



the reference values directly estimated from the discrete samples. The relative 
deviations are shown in figure [31 For most points, the deviation is well within a 
few percent of the reference value, which is negligible compared to the systematic 
uncertainties of simulation. Notably, the relative deviation becomes larger when 
M{Nft) approaches to 0. This is partly due to the smaller value of e(x) itself in 
this region. In addition, when approaching the edge of the parameter space of 
the training sample, the lack of neighboring points increases the uncertainty of 
the fitting. 

As mentioned in section [2l the BNN also estimates the uncertainties of the 
fitted probabilities, denoted as cr7VAf(x). The uncertainties account for the de- 
viation we observed, both in the central region with stable e(x) and at the edge 
region where M{Nfj) approaches 0. By comparing the deviation between the 
fitted value and the reference value to the BNN estimated uncertainty (figure S]), 
we can see good consistency over the entire parameter space. 

Only 1 million MC events were used by our example of continuous simulation. 
For the grid scan technique, a total of 17.1 million events at 171 discrete points 
were used to build a typical set of samples in the same parameter space region. 
To quantitatively compare the statistics demand, we calculated the number of 
events at each grid point that would yield a statistical uncertainty equal to 
the BNN uncertainty at the same point. It is worth mentioning that the BNN 
uncertainty includes not only the statistical fluctuation, but also the uncertainty 
on the goodness of fitting. This comparison shows, ignoring the interpolation 
errors, a total number of at least 6.3 million events for these 171 grid-scan points 
are needed in order to reach a similar precision as the 1 million continuous MC 
samples do. 

As mentioned above, the BNN uncertainty includes the uncertainty on the 
goodness of fitting. The counterpart in the grid-scan approach is the interpola- 
tion uncertainty when estimating the acceptance at any point not on the grid. 
To evaluate this uncertainty for the grid-scan method, we assumed that the 
value at the interpolated point is uniformly distributed between the values at 
the neighboring grid points in each dimension. The uncertainty is estimated as 
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Figure 3: The relative deviation of the BNN predictions from the estimations by grid-scan, 
at the selected parameter space points (a), and the distribution of these deviations (b). 
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Figure 4: The deviations between the BNN predictions and the estimations by grid-scan, 
divided by the BNN uncertainties, at the selected parameter space points (a), and the distri- 
bution of these ratios (b). 



10 



the standard deviation of this distribution, i.e. 1/v 12 of the difference between 
the two values. We averaged the upward and downward uncertainties (if they 
exist) to get an uncertainty for each grid point. This uncertainty of grid-scan 
is compared to the BNN uncertainty of continuous simulation, as shown in fig- 
ure [5l We can see that the interpolation could often introduce a non-negligible 
uncertainty, especially in the low Nn mass region where the acceptance changes 
steeply. To achieve similar interpolation precision as the BNN fitting, the grid- 
scan will need finer granularity and more statistics, as well as extra effort for 
optimization of the grid structure. 

4. Demo: 4-D Simplified SuperSymmetric Model 

In this section, we will illustrate the performance of the continuous simula- 
tion in a 4-D scenario. This is gluino pair production with a two-step cascade de- 
cay in one of the Simplified SuperSymmetric Models [l3|, |l4| . In this model, the 
gluino (g) can decay to the Lightest SuperSymmetric Particle (LSP), assumed 
to be the Xi, through the decay chain of 5 — > qqXi — > qqW^X2 — ?> qqW^ZXi. 
The BSM parameters in this model are the masses of the four SUSY particles. 



For this demonstration, we studied the mass range of g between 200 GeV and 
1200 GeV. To accommodate the decay chain concerned, the mass of the Xi 
was limited to between 200 GeV and M(g), and the mass of the X2 was limited 
to between 100 GeV and I 
between GeV and M(X2) 



to between 100 GeV and M(Xi ). Finally, the mass of the LSP was limited to 



The Herwig-1--|- generator |15| was used to simulate this process. For the 
continuous MC sample, a total number of 1 million events were generated at 
100k space points, stochastically distributed in the parameter space region. For 
simplicity, the study is again limited to generator level acceptance. We looked 
for final states with at least two leptons (e,/i) with transverse momentum greater 
than 20 GeV and with pseudo-rapidity 77 between -2.5 and 2.5. A 100% fiducial 
efficiency was assumed for lepton identification. In addition, the transverse 
energy flow of all interactive particles around the lepton within a Ai? = 0.4 
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Figure 5: The interpolation uncertainties evaluated by comparing to nearby points, in the 
directions of M{Vl^ij) (a) and M{Nii) (b) respectively. 
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cone was required to be below 5 GeV. As a typical SUSY signature, we also 
required the visible energy imbalance in the transverse direction to be greater 
than 50 GeV. The acceptance of this di-lepton plus missing transverse energy 
selection was then parameterized by a BNN of the same topology as in the 
previous example. 

For comparison, we also simulated discrete samples by grid-scan, with 100 
GeV steps in all four directions. The values of M(^) are from 250 GeV up to 
1150 GeV. The values of M(xf ) are from 200 GeV to M(5)-50 GeV. The values 
of M(X2) are from 100 GeV to M(xi^)-100 GeV. And the values of M(LSP) are 
from GeV to M(X2)-100 GeV. These made up a 4-D grid with 715 points, with 
30k events simulated at each point. 

With these 715 grid-scan samples, we obtained reference values of e(x) rang- 
ing from 0.04 to 0.16. Using the BNN trained by the continuous MC sample, we 
observed similar distributions of e(x), as shown in figure [51 Further comparison 
of the BNN estimations and the reference values at each point revealed that 
most of them are consistent, well within 10% relative deviation (fig. [71). 
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Figure 6: The acceptances at 715 grid points, evaluated from the grid-scan samples and the 
BNN fitting. 



We also examined the uncertainty estimated by the BNN, comparing it to 
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the actual difference between BNN estimation and the reference value. From 
figure m we can see that the uncertainties are certainly consistent with the 
actual deviation. 

To demonstrate the statistical efficiency of continuous simulation, we again 
calculated the minimal number of events at each grid point, if statistical preci- 
sion equivalent to the BNN uncertainty is desired. The comparison shows that, 
ignoring the interpolation errors, at least 3.2 million events on the 715 consid- 
ered grid points are needed, which is already greater than the 1 million statistics 
we used for the continuous MC sample. 

The grid-scan suffers more uncertainties of interpolation in a 4-D space. 
With the same measurements as in section [31 we estimated the interpolation 
uncertainties on each of the four parameters as well as their quadratic sum 



(fig. 9(a) I. Notably, in each direction, due to the constraint of the mass hier- 
archy, 55 out of the 715 points cannot have this interpolation uncertainty mea- 
sured, because they have no neighboring points. Overall there are 189 points 
whose interpolation uncertainties are underestimated due to this. Nevertheless, 
comparing them with the BNN uncertainties, we can see that the interpolation 
errors are worse than the BNN uncertainties at a large fraction of the parameter 
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Figure 8: The deviation between the BNN estimation and the grid-scan reference value, 
divided by the BNN uncertainties. 



space points (fig. |9(b)[ ). This indicates the necessity of a grid with finer granu- 
larity. However, increasing the number of points in each dimension by a mere 
factor of two would already mean 16 times larger statistics: a manifestation of 
the effect of the " curse of dimensionality" . 



5. Discussion 

Several potential improvements beyond what has been demonstrated can 
be made depending on the subject under study. For example, the choice of 
parameter space points can be more intelligently informed. In our examples, 
the parameters x are chosen from the space with a flat probability distribution. 
However, after the first round of acceptance fitting, we may identify certain 
parameter space regions in which the fitted values have relatively larger uncer- 
tainties, either due to the smaller values of e(x) or due to the proximity to the 
edge of the parameter space of the training samples. A typical case is the low 
M{Nfi) region in LRSM production, as shown in section [31 If enhancement of 
precision for these regions is desired, it can be achieved by simulating additional 
events in these regions. Fitting another BNN with the enlarged sample will then 
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Figure 9: The interpolation uncertainties on all fours axes and their quadratic combination, 
comparing to the reference value of acceptance (a) and the BNN uncertainty (b). 
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give more accurate result for such regions. 

In case it is impractical to produce additional MC events, an alternative 
solution is to fit another BNN with the existing sample, but assign higher weights 
during the fitting to those events in the concerned regions. This is equivalent 
to repeatedly using these events as additional samples. Although the statistical 
fluctuation remains, it will drive the fitting to focus more on the numerical 
feature of these regions and rely less on the interpolation from other regions. 

Another potential improvement is to split the parameter space according to 
different physics processes. The BNN fitting assumes smooth distribution over 
the parameter space. However, this assumption of smoothness may not hold 
at boundaries where the physics is different on each side. If such boundaries 
are known as a priori knowledge, independent BNNs should be applied on each 
sub-set of the parameter space within which the signal yield is smooth. 

Besides fitting the acceptance e(x), the production rate cr(x) can also be 
fitted by the BNN. This might be necessary if a continuous function of cr(x) 
is desired or if calculating cross-sections for a large number of parameter space 
points is impractical. The way of BNN fitting needs to be changed slightly 
for such a regression application. The NN output y{x.) should be allowed to 
range over all real numbers, while the target value will be the (T(xi) for each 
event. The cost function will be the Mean-Square-Error between y(xi) and 
cr(xi), representing a Gaussian likelihood. As a result, the signal expectation 
iV(x) can also be obtained continuously over the parameter space. 

6. Conclusion 

In this work, we introduced a new approach to simulate hypothetical pro- 
cesses with multiple free parameters and its use in estimating signal acceptance 
and yields continuously over the parameter space. In particular, the BNN tech- 
nique is used for unbinned fitting of the acceptance, which is the key to pro- 
viding continuous estimation and reducing the required number of simulation 
events. The two examples we showed both suggest that the acceptance can be 
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well estimated with fewer MC events than needed in the grid-scan approach. 
In addition, the uncertainty of the fitted signal yield is estimated by the BNN 
simultaneously which is a natural consequence of the Bayesian inference. 
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